
@KatyaRyazantseva KatyaRyazantseva commented Jan 6, 2026

While running devnet-1, one of the client nodes reached its storage limit. A subsequent Ansible deployment then failed while copying genesis files to the server because of insufficient storage, and the devnet could only be restarted after manual cleanup. This PR adds centralized data cleaning that runs before genesis generation when the --cleanData flag is used.

The clean-data playbook can be called independently for cleaning up node data directories. It supports all deployment modes: site.yml, deploy-nodes.yml, and tag-based execution.

Scenarios tested on Lighthouse server

# Primary use case (via spin-node.sh)
NETWORK_DIR=local-devnet ./spin-node.sh --node lighthouse_0 --deploymentMode ansible --useRoot --cleanData

# Direct ansible with site.yml
./ansible-deploy.sh --node lighthouse_0 --clean-data

# With deploy-nodes.yml
./ansible-deploy.sh --playbook deploy-nodes.yml --node lighthouse_0 --clean-data

# With tags
./ansible-deploy.sh --node lighthouse_0 --network-dir ansible-devnet --tags lighthouse --clean-data

# Independent cleaning (no deployment)
ansible-playbook -i ansible/inventory/hosts.yml ansible/playbooks/clean-data.yml \
    -e "genesis_dir=$(pwd)/ansible-devnet/genesis" \
    -e "node_names=lighthouse_0"


@ch4r10t33r ch4r10t33r left a comment


We appear to be double cleaning.
When clean_data=true is passed via site.yml, data gets cleaned:

  • First in clean-data.yml (step 1)
  • Then again in each role's tasks

Should we consider using clean-data.yml for pre-deployment cleaning alone and remove role-level cleaning when using site.yml?

msg: "Node key file {{ node_name }}.key not found in {{ genesis_dir }}"
when: not (node_key_stat.stat.exists | default(false))

- name: Check if node data directory has contents

Can we extract this into a shared task file under roles/common/tasks/clean-node-data.yml and use include_tasks in each role?

- name: Clean node data if requested
  include_tasks: "{{ playbook_dir }}/../roles/common/tasks/clean-node-data.yml"
  when: clean_data | default(false) | bool

This will modularize the code instead of copying the same content across 5 different roles (a number that is likely to increase in the future).

path: "{{ data_dir }}/{{ node_name }}"
state: absent
when: clean_data | default(false) | bool
when:

The find task to check if a directory has contents before deletion is redundant. Ansible's file: state=absent is idempotent. It will succeed whether the directory exists, is empty, or has contents.
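
A minimal sketch of what the simplified cleanup could look like, reusing the variables visible in the diff context above (data_dir, node_name, clean_data). The task name is illustrative and not taken from the PR:

```yaml
# Illustrative only: a single task instead of a find + conditional delete pair.
# state: absent succeeds whether the path is missing, empty, or populated,
# and removes directories recursively.
- name: Remove node data directory if cleaning was requested
  ansible.builtin.file:
    path: "{{ data_dir }}/{{ node_name }}"
    state: absent
  when: clean_data | default(false) | bool
```

Note that this removes the directory itself, not just its contents. If a later task expects the directory to exist, a follow-up file task with state: directory would recreate it.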


- name: Extract all node names
  shell: |
    yq eval '.validators[].name' {{ validator_config_file }}

This is a nit, but can we please add a check to validate if yq is installed before using it here?
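
Something along these lines might work as a guard, placed before the task that calls yq. This is only a sketch: the task names and the yq_check variable are hypothetical, and it assumes the check runs on the same host that later runs yq:

```yaml
# Hypothetical guard tasks; yq_check is an illustrative variable name.
- name: Check whether yq is available in PATH
  ansible.builtin.shell: command -v yq
  register: yq_check
  changed_when: false   # a lookup never changes the host
  failed_when: false    # let the next task report a readable error instead

- name: Fail early if yq is missing
  ansible.builtin.fail:
    msg: "yq is required to read {{ validator_config_file }} but was not found in PATH"
  when: yq_check.rc != 0
```

Splitting the lookup from the failure keeps the error message explicit rather than surfacing a generic "command not found" from the shell task.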

@KatyaRyazantseva

> We appear to be double cleaning. When clean_data=true is passed via site.yml, data gets cleaned:
>
>   • First in clean-data.yml (step 1)
>   • Then again in each role's tasks
>
> Should we consider using clean-data.yml for pre-deployment cleaning alone and remove role level cleaning when using site.yml?

This one was tricky. Both site.yml and deploy-nodes.yml work as independent playbooks and support the --cleanData flag, so both need cleaning. Correct me if I'm wrong.

When a server runs out of storage, cleaning must be the first remote task for all deployment playbooks. Otherwise, Ansible fails to create a temp folder on the server:

[ERROR]: Task failed: mkdir: cannot create directory /root/.ansible/tmp/ansible-tmp-1767979903.397822-19056-15485771047564: No space left on device

Currently, deploy-nodes.yml will fail with full storage. The common role runs first and tries to install packages. Then the individual client roles run tasks like "Extract node configuration" before they reach the cleaning tasks. All these tasks need temp space, so they fail before cleaning ever runs. I can add cleaning as the first task in deploy-nodes.yml (before the common role) and use a skip_role_cleaning flag when it's already done in site.yml. This avoids duplication, and we can delete cleaning on the role level.
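
To illustrate the ordering described above, here is a rough sketch of how deploy-nodes.yml could be laid out. It is not the actual file from this PR: the hosts pattern, the use of the raw module (which runs a command over SSH without copying a module to the remote temp directory), and the reuse of data_dir and node_name at play level are all assumptions.

```yaml
# Assumed layout, for illustration only.
- name: Deploy nodes
  hosts: all                 # assumption: the real play may target a narrower group
  gather_facts: false        # fact gathering also needs remote temp space
  pre_tasks:
    # pre_tasks run before any role, so this is the first remote task.
    - name: Clean node data before anything else touches the server
      ansible.builtin.raw: rm -rf -- "{{ data_dir }}/{{ node_name }}"
      when:
        - clean_data | default(false) | bool
        - not (skip_role_cleaning | default(false) | bool)
        - (data_dir | default('')) | length > 0    # guard against an empty path
        - (node_name | default('')) | length > 0

    - name: Gather facts once space has been freed
      ansible.builtin.setup:

  roles:
    - common
```

With this shape, site.yml could pass skip_role_cleaning=true after running clean-data.yml itself, so the data is only cleaned once per run.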

@KatyaRyazantseva

Should we move helper files like deploy-single-node.yml into a separate folder (e.g., playbooks/utils/ or playbooks/helpers/) to make it clear which playbooks are top-level and can be run independently?
